Introduction
Usage
Options
Input Data: Fields And Separators
Simple Expressions
Expressions With Arithmetic
Expressions With Functions
Errors During Processing
More Examples
Expression Syntax
The Function Library
Limitations
tcols is a filter for projecting and transforming data columns.
tcols runs from the command line or from batch files.
Input and output data are plain ASCII text lines. By default, tcols treats each line as whitespace-separated fields (but see the -i option). Input and output data are typically kept in files.
For example, consider a text file "data", containing the following table (3 columns, 4 lines):
john 45 tennis
al 31 squash
tom 25 beer
paul 38 women
The command:
tcols from data $3 $2
writes the third and second columns (separated by a tab) to the screen:
tennis	45
squash	31
beer	25
women	38
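If it helps to see the idea in another language, here is a rough Python sketch of the same projection. It only illustrates the behaviour described above and is not how tcols is implemented; the helper name project is made up for this example, and quoting is ignored.

```python
def project(line, columns, sep="\t"):
    """Return the 1-based `columns` of a whitespace-separated line, joined by `sep`."""
    fields = line.split()
    if not fields:
        # A whitespace-only line gives an empty output line.
        return ""
    return sep.join(fields[n - 1] for n in columns)


data = ["john 45 tennis", "al 31 squash", "tom 25 beer", "paul 38 women"]
for line in data:
    # Same selection as the expressions $3 $2.
    print(project(line, [3, 2]))
```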
Here's another example, using the same file "data". The command:
tcols from data to results $1 /loves/ $3.upp
writes the following to the file "results":
john	loves	TENNIS
al	loves	SQUASH
tom	loves	BEER
paul	loves	WOMEN
The last example shows the use of functions. tcols has functions for string manipulation, formatting, decimal/hex/octal conversions, and a few other things.
The above examples show only a few of tcols's capabilities, so read the next sections for a full description.
Note: All usage examples in this document are for tcols running on MS-DOS. Running tcols on a Unix shell requires quoting appropriate for the particular shell.
Where:
[] denotes an optional item.
Upper/lower case for the 'log', 'from', and 'to' keywords is not significant. Also, these keywords should not be used as file names.
The input data to tcols is ordinary ASCII text lines.
tcols sees each line as consisting of zero or more fields (denoted $1, $2, ...).
Whitespace-separated fields (default)
tcols sees each field as separated by at least one tab or blank, e.g.:
john 37 butcher (end-of-line)
<--> <> <----->
$1   $2 $3
If an input line has no fields (i.e., consists of whitespace only), then tcols will write an empty line to the output, without evaluating the expression(s).
If you want a field to contain whitespace, then the field must be surrounded by single quotes, e.g. 'hey you', or by double quotes, e.g. "hey you".
If you want a single quote inside a singly quoted field, precede it by a backslash, e.g. 'It\'s allright'.
If you want a double quote inside a doubly quoted field, precede it by a backslash, e.g. "She said \"yes\"".
If you want a backslash inside a singly/doubly quoted field, precede it by another backslash, e.g. "a backslash: \\".
If you want a single quote inside a doubly quoted field, no special care is needed, e.g. "It's allright".
If you want a double quote inside a singly quoted field, no special care is needed, e.g. 'She said "yes"'
'' and "" are valid fields.
If tcols finds an unmatched quote on an input line, then tcols reads that quote and the rest of the line as one field. For example:
12 5654 'I feel good 8899 (newline)
<> <--> <--------------->
$1 $2   $3
When tcols reads a quoted field from the input, tcols considers the surrounding quotes part of the field.
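The quoting rules above can be sketched in Python as follows. This is an illustration under assumptions, not tcols's actual code: the name split_fields is made up, a quote is only treated specially at the start of a field, and backslashes are left in the field text exactly as read (the text above does not say whether tcols strips them).

```python
def split_fields(line):
    """Split a line into whitespace-separated fields; quotes stay part of the field."""
    fields, i, n = [], 0, len(line)
    while i < n:
        if line[i] in " \t":              # skip separators
            i += 1
            continue
        start = i
        if line[i] in "'\"":              # quoted field
            quote = line[i]
            i += 1
            while i < n and line[i] != quote:
                # A backslash protects the character that follows it.
                i += 2 if line[i] == "\\" else 1
            if i >= n:
                # Unmatched quote: the quote and the rest of the line are one field.
                fields.append(line[start:])
                break
            i += 1                        # include the closing quote
        else:
            while i < n and line[i] not in " \t":
                i += 1
        fields.append(line[start:i])
    return fields


print(split_fields("12 5654 'I feel good 8899"))
```

The last call prints three fields, the third being the unmatched quote together with everything after it.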
Character-separated fields
If you use the -iC option, tcols uses the character C to separate the fields on an input line. In this discussion, we'll consider comma-separated input data.
As an example, this is how tcols -i, would see the following input line:
Al, 42, shoe salesman,married,2, Dodge (newline)
<> <-> <------------> <-----> - <---->
$1 $2  $3             $4      $5 $6
Any text between the start of the line and the first comma, between two commas, or between the last comma and the end of the line, constitutes a field.
Two commas right next to each other constitute an empty field; this is perfectly legal.
If an input line consists of whitespace only, then tcols will write an empty line to the output, without evaluating the expression(s). Otherwise, whitespace has no special significance when you're using the -i option.
Quotes have no special significance when you're using the -i option.
If you want a comma inside a field, precede it by a backslash: \,
If you want a backslash inside a field, precede it by a backslash: \\
Make sure your input data does not have unwanted spaces at the end of lines!
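Here is a small Python sketch of the character-separated rules. It is an illustration only: the name split_on_char is made up, and the sketch keeps the escaping backslash in the field text, which is an assumption (the text above does not say whether tcols removes it). The special treatment of whitespace-only lines is left to the caller.

```python
def split_on_char(line, sep=","):
    """Split `line` on unescaped `sep`; whitespace and quotes are ordinary text."""
    fields, current, i = [], [], 0
    while i < len(line):
        ch = line[i]
        if ch == "\\" and i + 1 < len(line):
            # A backslash protects the next character (separator or backslash).
            current.append(line[i:i + 2])
            i += 2
        elif ch == sep:
            fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


print(split_on_char("Al, 42, shoe salesman,married,2, Dodge"))
```

Note how the leading blanks stay in the fields, and how two adjacent separators would give an empty field.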
Fixed-length fields
Use $r (raw line) and the subs function. For example, the command:
tcols -o, from alpha.txt $r.subs(1,8) $r.subs(9,12) $r.subs(13,18)
where the file "alpha.txt" contains the line:
abcdefghijklmnopqrstuvwxyz
yields the following output:
abcdefgh,ijkl,mnopqr
Expressions specify how tcols should map input data to output data. tcols applies the expressions to each input line in turn, producing a corresponding output line. The only exception is empty input lines (lines that contain only whitespace); they result in an empty line on the output, without being evaluated.
Here are some simple expressions:
$3     : Yields the third field on the input line.
$1..4  : Yields the first ... fourth field on the input line.
$l     : Yields the last field on the input line.
$2..l  : Yields the second ... last field on the input line.
$c     : Yields the count of fields on the input line.
$r     : Yields the entire input line, whitespace and all.
532    : Yields the literal integer 532.
/hey/  : Yields the literal string hey.
In general, $N yields the N'th field, for integer N >= 1.
In general, $M..N yields the M'th ... N'th field, for integers M,N >= 1, M <= N.
In general, $M..l yields the M'th ... last field, for M >= 1.
$c is useful for 'jaggy' tables: tables with an unknown number of fields on each line.
$r is useful when input lines cannot be treated as (e.g.) whitespace separated fields.
In literal strings, use \/ for /, and \\ for \.
Note that other backslash character sequences are not transformed; e.g. \j remains \j.
An integer within //, e.g. /876/, is still regarded as a literal integer.
Example: printing 3rd and 5th fields separated by just a colon:
tcols -o: from myfile $3 $5
Example: swapping the 3rd and 8th columns in a 12-column table:
tcols from myfile $1..2 $8 $4..7 $3 $9..12
Syntax errors in expressions will cause tcols to exit with an appropriate error message, before any processing.
An expression should not contain spaces, except in string literals (in which case the whole expression must be surrounded by double quotes, e.g.: "/hi there/".)
The Expression Syntax section describes the exact grammar.
This section describes how to use tcols's arithmetic operators: + - * / %.
Here are some expressions that show their usage:
$1+$2   : Yields the sum of the first and the second fields.
100-$6  : Yields the difference between 100 and the sixth field.
$1*$2   : Yields the product of the first and second fields.
$3/2    : Yields the third field divided by 2.
$1%%10  : Yields the remainder of (the first field divided by 10). (Note 1)
-$2     : Yields the second field negated. (Note 2)
Note 1: The extra % is needed to prevent the MS-DOS shell from treating %10 as the 10th command line argument.
Note 2: If you invoke tcols to use standard input/output, and the first expression starts with a '-', then put that first expression in parentheses, e.g. (-$2), so tcols doesn't think it's a command line option.
The arithmetic operators work on integers, or on expressions that evaluate to integers.
Shortcuts are possible. For example, the expression:
($2,$3,$1)*10
applied to the input line:
1 2 3
yields the following output:
20 30 10
Note that the right hand side of + - * / and % must evaluate to exactly one number.
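The shortcut can be pictured as applying the operator to every value on the left, always with the same single value on the right. Here is a Python sketch of that reading; the name distribute is made up, and / and % are left out because this document does not say how tcols rounds integer division.

```python
import operator

# Only the operators whose integer results are unambiguous.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def distribute(left_values, op, right_value):
    """Apply `op` to each left value, using the one right-hand number every time."""
    return [OPS[op](value, right_value) for value in left_values]


fields = [1, 2, 3]                       # the input line "1 2 3"
selected = [fields[1], fields[2], fields[0]]   # ($2,$3,$1)
print(distribute(selected, "*", 10))
```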
Unary - (minus) has the highest precedence, so the following are equivalent:
-$2-4
(-$2)-4
* / and % have equal, and next highest precedence. They're evaluated left to right, so the following are equivalent:
$2*$4/$2
($2*$4)/$2
+ and binary - have the lowest precedence, and are evaluated left to right, so the following are equivalent:
$1-$2+$5*7
($1-$2)+($5*7)
Parentheses, ( ), can be used to override precedence:
($1+$2)*100
This section describes how to form expressions with function calls.
A function call has one of the forms:
expression.functionname
expression.functionname(arguments)
Here are some example function calls:
$1.suqt       : Yields first field with surrounding single quotes removed.
$2.clip(3,5)  : Yields second field with 3 leftmost and 5 rightmost characters clipped off.
$1..5.rjf(8)  : Yields first .. fifth fields right justified in fields (sorry!) of 8 spaces.
As a shortcut, expressions can be grouped with ( ) and then fed to a function:
($1,$3,$4,$8).suqt : Yields first, third, fourth, and eighth fields without surrounding single quotes ('').
This saves you from writing:
$1.suqt $3.suqt $4.suqt $8.suqt
Some functions are only meaningful when applied to several expressions:
($1,$4,$7).cat : Yields the concatenation of the first, fourth, and seventh fields.
Function calls can be chained:
$r.subs(1,10).upp          : Yields first 10 characters in upper case.
$1..l.sum.rjf(10).padl(0)  : Yields sum of all fields, right justified in field of 10 characters padded with 0's.
Any expression can be used as a function argument:
$3.rig($1.len) : Yields the N rightmost characters of the third field, where N is the length of the first field.
If a function is given the wrong number of arguments, or the wrong type of arguments, tcols will print an error message to standard error (or logfile, if used) and exit. However, if you use the -w command line option, tcols will skip the offending input line, print a warning to standard error (or logfile, if used), and continue processing the next input line; see the Errors During Processing section.
The Function Library section describes all functions and their required arguments.
A processing error occurs if the contents of an input line prevent tcols from evaluating your expressions.
tcols's default error action is to print a relevant error message and exit.
However, if you set the -w command line option, tcols will skip the bad input line and continue processing the next input line. tcols prints a warning anyway.
tcols prints error messages and warnings to standard error (or the logfile, if used).
Here are some typical processing errors:
tcols is rather strict about input data. For example, the sum function will only work on integer arguments, even though I could have made it ignore non-integer arguments. My reasoning is: tcols will often be used for processing hand-typed data. Typists sometimes hit the wrong keys. If tcols were lax about bad input data, it might quietly produce bad output data.
This section gives more examples of complete tcols commands.
These examples start with the file "books" which contains:
Poe 'Edgar Allen' "Selected Stories" 1879 horror
Thompson Jim "The Killer Inside Me" 1950 crime
Lem Stanislaw "Return From the Stars" 1961 sf
Crumley James "Dancing Bear" 1983 crime
'Le Carre' John "Smiley's People" 1972 spy
Now, this file looks a bit messy. You want to reformat it to look cleaner, with first names and surnames together, no single quotes around the names, and no year of publication. The command:
tcols -o from books to books2 "($1.suqt,/, /,$2.suqt).cat.ljf(20)" $3.ljf(25) $5
prints the following to "books2":
Poe, Edgar Allen    "Selected Stories"       horror
Thompson, Jim       "The Killer Inside Me"   crime
Lem, Stanislaw      "Return From the Stars"  sf
Crumley, James      "Dancing Bear"           crime
Le Carre, John      "Smiley's People"        spy
All right. To ease future processing, you want your book list in a field-oriented format. The command:
tcols -o from books2 to books3 $r.subs(1,16).trt.dqt.ljf(20) $r.subs(21,43).trt.ljf(25) $l
prints the following to "books3":
"Poe, Edgar Allen"  "Selected Stories"       horror
"Thompson, Jim"     "The Killer Inside Me"   crime
"Lem, Stanislaw"    "Return From the Stars"  sf
"Crumley, James"    "Dancing Bear"           crime
"Le Carre, John"    "Smiley's People"        spy
Now, you can use another TextTools program, trows, to print all your crime books. The command:
trows from books3 $3=/crime/
prints to the screen:
"Thompson, Jim"     "The Killer Inside Me"   crime
"Crumley, James"    "Dancing Bear"           crime
Or, you can sort your books on author name, using yet another TextTools program: tsort. The command:
tsort from books3 $1
prints to the screen:
"Crumley, James"    "Dancing Bear"           crime
"Le Carre, John"    "Smiley's People"        spy
"Lem, Stanislaw"    "Return From the Stars"  sf
"Poe, Edgar Allen"  "Selected Stories"       horror
"Thompson, Jim"     "The Killer Inside Me"   crime
expr   ::= list
list   ::= arit,list | arit
arit   ::= arit+term | arit-term | term
term   ::= term*neg | term/neg | neg
neg    ::= -neg | call
call   ::= call.funcname(list) | call.funcname | simple
simple ::= $M       ; M an integer >= 1
         | $M..N    ; M,N integers >= 1, M <= N
         | $M..l    ; M an integer >= 1
         | $l
         | $c
         | $r
         | number
         | /string/
         | (list)
number ::= one or more digits (0-9)
string ::= one or more printable characters, but use \/ for forward-slash, \\ for backslash
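To see how the grammar produces the precedence rules, here is a small recursive-descent evaluator in Python for a cut-down subset: arit, term (with * only), neg, and a simple that is limited to $M, number and a parenthesised arit. It is a sketch, not tcols's parser. Lists, ranges, function calls and strings are left out, and so are / and %, because this document does not say how tcols rounds integer division. The name evaluate is made up.

```python
import re

TOKEN = re.compile(r"\$\d+|\d+|[-+*()]")


def evaluate(expression, fields):
    """Evaluate a +, -, * expression over integer fields ($1 is fields[0])."""
    tokens = TOKEN.findall(expression)
    if "".join(tokens) != expression:
        raise ValueError("unexpected character in expression")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def arit():                      # arit ::= arit+term | arit-term | term
        value = term()
        while peek() in ("+", "-"):
            if take() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():                      # term ::= term*neg | neg
        value = neg()
        while peek() == "*":
            take()
            value *= neg()
        return value

    def neg():                       # neg ::= -neg | simple
        if peek() == "-":
            take()
            return -neg()
        return simple()

    def simple():                    # simple ::= $M | number | (arit)
        token = take()
        if token == "(":
            value = arit()
            if take() != ")":
                raise ValueError("missing )")
            return value
        if token.startswith("$"):
            return int(fields[int(token[1:]) - 1])
        return int(token)

    result = arit()
    if peek() is not None:
        raise ValueError("unexpected trailing input")
    return result


line = "7 3 0 0 2".split()
print(evaluate("$1-$2+$5*7", line), evaluate("($1-$2)+($5*7)", line))
```

Each grammar rule becomes one function, and the left-recursive rules become loops, which is what gives left-to-right evaluation within a precedence level.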
Formatting - Number base conversion - Mathematical - Miscellaneous
This section describes all of tcols's functions.
E, E1, etc., in this discussion denote expressions, as far as syntax is concerned, and the results of evaluating expressions as far as evaluation is concerned.
For example, sqt applied to:
hey     yields: 'hey'
'hey    yields: 'hey'
hey'    yields: 'hey'
'hey'   yields: 'hey'
'       yields: ''
''      yields: ''
hey\'   yields: 'hey\''
sqt applied to the empty string yields: ''
For example, suqt applied to:
'hey'   yields: hey
'hey    yields: hey
hey'    yields: hey
''      yields: the empty string
'       yields: the empty string
hey\'   yields: hey\'
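One way to read the sqt and suqt examples is: suqt drops a leading quote and a trailing quote that is not preceded by a backslash, and sqt puts quotes around whatever suqt leaves. The Python sketch below follows that reading and reproduces the examples above. It is a guess at the rule from the examples, not tcols's code, and it does not try to handle a backslash that is itself escaped.

```python
def suqt(text):
    """Remove a surrounding single quote at either end, if present."""
    if text.startswith("'"):
        text = text[1:]
    # A trailing \' is an escaped quote and is left alone.
    if text.endswith("'") and not text.endswith("\\'"):
        text = text[:-1]
    return text


def sqt(text):
    """Surround text with single quotes without doubling existing ones."""
    return "'" + suqt(text) + "'"


for sample in ["hey", "'hey", "hey'", "'hey'", "'", "''", "hey\\'"]:
    print(sample, "->", sqt(sample), "and", repr(suqt(sample)))
```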
upp does not touch non-letters.
low does not touch non-letters.
'        changed to  \'
"        changed to  \"
\        changed to  \\
tab      changed to  \t
newline  changed to  \n
For example, resc applied to:
'ok'   yields: \'ok\'
a"b'   yields: a\"b\'
kh\k   yields: kh\\k
\'\"   yields: \\\'\\\"
(Newlines can only occur as the result of desc applied to a string that contains \n)
\'  changed to  '
\"  changed to  "
\\  changed to  \
\t  changed to  tab
\n  changed to  newline
desc changes every \xHH (where HH is exactly two hexadecimal digits) to the corresponding ASCII character.
desc changes every \O (where O is one, two, or three octal digits) to the corresponding ASCII character.
desc makes no other changes. For example, \z is not changed to z.
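The resc and desc rules are mirror images of each other, which a short Python sketch makes easy to check. This is an illustration of the rules listed above, not tcols's implementation; in particular, how tcols treats octal values above the ASCII range is not stated here, so the sketch simply converts whatever number it finds.

```python
import re

# resc: one character in, an escaped sequence out.
_RESC = {"'": "\\'", '"': '\\"', "\\": "\\\\", "\t": "\\t", "\n": "\\n"}

# desc: \xHH, one to three octal digits, or one of the five simple escapes.
_SIMPLE = {"'": "'", '"': '"', "\\": "\\", "t": "\t", "n": "\n"}
_ESCAPE = re.compile(r"\\(x[0-9A-Fa-f]{2}|[0-7]{1,3}|['\"\\tn])")


def resc(text):
    """Escape quotes, backslashes, tabs and newlines."""
    return "".join(_RESC.get(ch, ch) for ch in text)


def desc(text):
    """Undo the escapes; any other backslash sequence is left untouched."""
    def replace(match):
        body = match.group(1)
        if body[0] == "x":
            return chr(int(body[1:], 16))
        if body[0] in "01234567":
            return chr(int(body, 8))
        return _SIMPLE[body]
    return _ESCAPE.sub(replace, text)


print(resc("'ok'"), desc(r"\x41\102"), desc(r"\z"))
```

Applying desc to the result of resc gives back the original string.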
For example:
/ aaa/.trl.sqt yields: 'aaa'
For example:
/aaa /.trt.sqt yields: 'aaa'
For example:
/ aa a /.trt.sqt yields: 'aa a'
For example:
(/a/,/b/,/c/).prf(/#s---#s---#s/) yields: a---b---c
There must be enough E's for the #s's. Extra E's are ignored.
w must be an integer in the range 1 .. 255.
For example:
45.rjf(7).sqt yields: '     45'
45.rjf(2).sqt yields: '45'
45.rjf(1).sqt yields: '45'
w must be an integer in the range 1 .. 255.
For example:
45.ljf(7).sqt yields: '45     '
45.ljf(2).sqt yields: '45'
45.ljf(1).sqt yields: '45'
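Right and left justification behave much like Python's str.rjust and str.ljust: the value is padded with blanks up to the width w, and a value that is already wider is left as it is. A sketch, with the 1 .. 255 limit on w checked by hand (the error raised here is just this sketch's choice):

```python
def _check_width(w):
    if not 1 <= w <= 255:
        raise ValueError("w must be an integer in the range 1 .. 255")


def rjf(value, w):
    """Right justify value in a field of w characters."""
    _check_width(w)
    return str(value).rjust(w)


def ljf(value, w):
    """Left justify value in a field of w characters."""
    _check_width(w)
    return str(value).ljust(w)


print(repr(rjf(45, 7)), repr(ljf(45, 7)), repr(rjf(45, 1)))
```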
i must be an integer greater than or equal to 0.
If E has fewer than i characters, E.rig(i) yields E.
Useful for appending the same string to several expressions.
For example:
(4,5,6).app(/.00/) yields: 4.00 5.00 6.00
Useful for prepending the same string to several expressions.
For example:
(2,3,4).pre(/#/) yields: #2 #3 #4
For example:
/istanbul/.rev yields: lubnatsi
Note that rev changes \' to '\, etc.
i and j must be integers greater than or equal to 0.
If the length of E is less than or equal to i + j, then E.clip(i,j) yields the empty string.
For example:
/abcdefg/.clip(2,3) yields: cd
i and j must be integers greater than or equal to 1.
j must be greater than or equal to i.
If i is greater than the length of E, E.subs(i,j) yields the empty string.
If j is greater than the length of E, E.subs(i,j) yields characters i .. length-of-E of E.
For example:
/abcdefgh/.subs(3,6) yields: cdef
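The index rules for clip and subs map onto Python slices as sketched below. Remember that subs counts characters from 1 and includes both ends, while Python slices count from 0 and exclude the end. The argument checks raise ValueError here, which is this sketch's choice and not tcols's error handling.

```python
def clip(text, i, j):
    """Drop the i leftmost and j rightmost characters."""
    if i < 0 or j < 0:
        raise ValueError("i and j must be >= 0")
    if len(text) <= i + j:
        return ""
    return text[i:len(text) - j]


def subs(text, i, j):
    """Characters i .. j of text, counting from 1, both ends included."""
    if i < 1 or j < i:
        raise ValueError("need 1 <= i <= j")
    # Slicing already gives "" when i is past the end,
    # and stops at the end of text when j is past the end.
    return text[i - 1:j]


print(clip("abcdefg", 2, 3), subs("abcdefgh", 3, 6))
```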
s must be exactly one character long.
For example:
/  55/.padl(/0/) yields: 0055
s must be exactly one character long.
For example:
/ok   /.padt(/./) yields: ok...
For example:
($2,$3,$1).cat
applied to the input line:
56 john zap
yields:
johnzap56
E must be an integer in decimal form.
For example:
256.d2h yields: 100
If E is negative, the number of hexadecimal digits in the result depends on the type of CPU tcols is run on. (tcols uses the C 'long integer' type for internal number representation.)
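To picture why the digit count for negative numbers depends on the CPU, here is a Python sketch of d2h that takes the width of the C long as a parameter. The parameter name bits, the default of 32, and the lower-case output are all assumptions made for this sketch; tcols itself simply uses whatever long is on the machine it was compiled for.

```python
def d2h(value, bits=32):
    """Decimal integer to hexadecimal digits; negatives wrap around at `bits` bits."""
    number = int(value)
    if number < 0:
        # Two's complement view of a negative number in a `bits`-wide integer.
        number &= (1 << bits) - 1
    return format(number, "x")


print(d2h(256), d2h(-1, bits=32), d2h(-1, bits=64))
```

With a 32-bit long, -1 shows up as 8 hexadecimal digits; with a 64-bit long, as 16.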
E must contain only hexadecimal digits (0..9 a..f A..F).
E must be an integer in decimal form.
If E is negative, the number of octal digits in the result depends on the type of CPU tcols is run on. (tcols uses the C 'long integer' type for internal number representation.)
E must contain only octal digits (0..7).
E1, E2, ... must all be integers.
For example:
/mama/.len yields: 4
If E and f are both integers, they are compared numerically; otherwise they are compared ASCII-wise.
For example:
$1.if(20,/TWENTY/)
applied to the input lines:
20
67
4
0020
yields the following output lines:
TWENTY
67
4
TWENTY
If E and f are both integers, they are compared numerically; otherwise they are compared ASCII-wise.
For example:
$1.ifel(20,/TWENTY/,/other/)
applied to the input lines:
20
67
4
+0020
yields the following output lines:
TWENTY
other
other
TWENTY
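The two comparison modes explain why 0020 and +0020 match 20 in these examples. A Python sketch of that behaviour follows. The names same, tcols_if and tcols_ifel are made up, and the definition of "integer" used here (an optional sign followed by digits) is an assumption based on the examples.

```python
import re

_INTEGER = re.compile(r"[+-]?[0-9]+")


def same(a, b):
    """Compare numerically when both look like integers, else as plain text."""
    if _INTEGER.fullmatch(a) and _INTEGER.fullmatch(b):
        return int(a) == int(b)
    return a == b


def tcols_ifel(value, match, then, otherwise):
    """Yield `then` when value equals match, `otherwise` when it does not."""
    return then if same(value, match) else otherwise


def tcols_if(value, match, then):
    """Like tcols_ifel, but a non-matching value passes through unchanged."""
    return tcols_ifel(value, match, then, value)


for field in ["20", "67", "4", "+0020"]:
    print(tcols_if(field, "20", "TWENTY"), tcols_ifel(field, "20", "TWENTY", "other"))
```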
For example:
($1,$2,$3).amax
applied to the input line:
lemonade gin port
yields:
port
For example:
$1..l.turn
applied to the input line:
56 4 11 899 66
yields:
66 899 11 4 56
i and j must be integers greater than or equal to 1.
i must be within the count of E1,E2,...
j must be greater than or equal to i.
For example:
$1..l.rng(2,4)
applied to the input line:
56 4 11 899 66
yields:
4 11 899
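Both turn and rng work on the list of values as a whole rather than on each value. In Python terms (a sketch only; the ValueError is this sketch's way of flagging bad arguments):

```python
def turn(values):
    """Yield the values in reverse order."""
    return list(reversed(values))


def rng(values, i, j):
    """Yield values i .. j, counting from 1, both ends included."""
    if not 1 <= i <= len(values) or j < i:
        raise ValueError("need 1 <= i <= count of values, and j >= i")
    return values[i - 1:j]


fields = "56 4 11 899 66".split()
print(" ".join(turn(fields)))
print(" ".join(rng(fields, 2, 4)))
```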
For example, the command:
tcols -o, from myfile $1 $2.nl $3 $4
applied to the file "myfile" containing:
this is line 1
this is line 2
prints the following to the screen:
this,is
line,1
this,is
line,2
This section describes tcols's limitations. Normally these limitations won't bother you, but anyway, here they are:
The maximum length of an input line is 255 characters, not counting newline. tcols will exit (with an appropriate error message) on reading an input line that is too long.
The maximum length of the result of an expression, or part of an expression, is 255 characters. If this limit is exceeded, tcols treats this as a processing error.
The range of integers depends on the compiler and CPU used, but you can assume at least -2147483647 ... 2147483647. The C type 'long int' is used for all things numerical. tcols does not detect numerical overflows and underflows, and tcols's behaviour is undefined in such cases.
The maximum total length of literal strings in the expressions is 300 characters. Note that every literal string counts one extra (unseen) character. tcols will exit if this limit is exceeded, which isn't likely.
tcols has internal tables for representing expressions and results of evaluating expressions. These tables are of fixed sizes and may become full, if you use very many/complex expressions. If so, tcols will exit, with an error message. Remedy: run tcols in several passes, using fewer/simpler expressions in each pass.
tcols will print an error message to standard error (or logfile, if used), if any of the above error situations occurs.